Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition
نویسندگان
چکیده
منابع مشابه
Bispectrum Estimators for Voice Activity Detection and Speech Recognition
A new Bispectra Analysis application is presented is this paper. A set of bispectrum estimators for robust and effective voice activity detection (VAD) algorithm are proposed for improving speech recognition performance in noisy environments. The approach is based on filtering the input channel to avoid high energy noisy components and then the determination of the speech/non-speech bispectra b...
متن کاملTwo-layered audio-visual integration in voice activity detection and automatic speech recognition for robots
Automatic Speech Recognition (ASR) which plays an important role in human-robot interaction should be noise-robust because robots are expected to work in noisy environments. Audio-Visual (AV) integration is one of the key ideas to improve the robustness in such environments. This paper proposes two-layered AV integration for ASR which applies AV integration to Voice Activity Detection (VAD) and...
متن کاملA robust audio-visual speech recognition using audio-visual voice activity detection
This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments, using visual information in addition to acoustic features. Similarly, AVVAD increases the precision of VAD in noisy conditions, which detects presence ...
متن کاملAn Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition
Noise-robust Automatic Speech Recognition (ASR) is essential for robots which are expected to communicate with humans in a daily environment. In such an environment, Voice Activity Detection (VAD) strongly affects the performance of ASR because there are many acoustically and visually noises. In this paper, we improved Audio-Visual VAD for our two-layered audio visual integration framework for ...
متن کاملReal-time audio-visual voice activity detection for speech recognition in noisy environments
Voice activity detection (VAD) is one of the most critical issues on performance degradation of speech recognition in noisy environment applications. A real-time VAD was developed by using face parameters (eye and lip contours) as a front-end for the traditional speech and noise (audio) GMMbased method. Speech recognition performance of the audiovisual VAD is shown to be comparable with audio-o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: The Journal of the Acoustical Society of Korea
سال: 2015
ISSN: 1225-4428
DOI: 10.7776/ask.2015.34.4.321